Abstract

Distributed adaptive networks achieve better estimation performance by exploiting both temporal and spatial diversity
while consuming few resources. Recent works have studied the single-task distributed estimation problem, in which the nodes
collaboratively estimate a single optimum parameter vector. However, there are many important applications in which multiple
vectors have to be estimated simultaneously, in a collaborative manner. This paper presents multi-task diffusion strategies based on
the Affine Projection Algorithm (APA); the use of APA makes the algorithm robust against correlated input. The performance
of the proposed multi-task diffusion APA algorithm is analyzed in the mean and mean-square sense. A modified
multi-task diffusion strategy is also proposed that improves both the convergence rate and the steady-state EMSE.
Simulations are conducted to verify the analytical results.


I. Introduction 

Distributed adaptation over networks has emerged as an attractive and challenging research area with the advent of multi-agent
(wireless or wireline) networks. Recent results in the field can be found in [1]-[3]. In adaptive networks, the interconnected
nodes continuously learn and adapt, and also perform assigned tasks such as parameter estimation from observations
collected by the dispersed agents. Consider a connected network consisting of $N$ nodes observing temporal data arising from
different spatial sources with possibly different statistical profiles. The objective is to enable the nodes to estimate a parameter
vector of interest, $\mathbf{w}^{opt}$, from the observed data. In a centralized approach, the data or local estimates from all nodes would
be conveyed to a central processor, where they would be fused and the parameter vector estimated. To avoid
the need for a powerful central processor and the extensive communication of a traditional centralized solution,
a distributed solution is developed that relies only on local data exchange and interactions between immediate neighborhood
nodes, while retaining the estimation accuracy of the centralized solution. In distributed networks, the individual nodes share the
computational burden, so that communications are reduced compared to the centralized network, and power and bandwidth
usage are thereby reduced as well. Owing to these merits, distributed estimation has received more attention recently and has been widely
used in many applications, such as precision agriculture, environmental monitoring, military surveillance, transportation and
instrumentation.

The mode of cooperation that is allowed among the nodes determines the efficiency of any distributed implementation. In the
incremental mode of cooperation, each node transfers information to its adjacent node in a sequential manner, using a
cyclic pattern of collaboration. This approach reduces communications between nodes and improves the network autonomy
compared to the centralized solution. In practical wireless sensor networks, however, it becomes more difficult to establish the cyclic pattern
required by the incremental mode of cooperation as the number of sensor nodes increases. In the diffusion mode
of cooperation, on the other hand, each node exchanges information with its neighborhood $\mathcal{N}_k$ (i.e., the set of all its neighbors, including itself), as
directed by the network topology. There exist several useful distributed strategies for sequential data processing over networks,
including consensus strategies, incremental strategies and diffusion strategies. Diffusion strategies exhibit
superior stability and performance over consensus-based algorithms.

The existing literature on distributed algorithms shows that most works focus primarily on the case where the nodes estimate
a single optimum parameter vector collaboratively. We shall refer to problems of this type as single-task problems. However,
many problems of interest happen to be multi-task oriented, i.e., they involve the general situation where there are connected clusters
of nodes and each cluster has its own parameter vector to estimate. The estimation still needs to be performed cooperatively across
the network because the data across the clusters may be correlated and, therefore, cooperation across clusters can be beneficial.
This concept is relevant to the context of distributed estimation and adaptation over networks. Initial investigations along these
lines for the traditional diffusion strategy appear in [13]-[19]. It is well known that, in the case of a single adaptive filter, one
major drawback of the LMS algorithm is its slow convergence rate for colored input signals, and the APA algorithm is a better
alternative to LMS in such an environment. For distributed networks, highly correlated inputs also deteriorate the performance
of the multi-task diffusion-LMS (multi-task d-LMS) algorithm. In this paper we therefore focus on a new APA-based multi-task
distributed learning scheme over networks that offers a good compromise between convergence performance and computational
cost, and we analyze its performance in terms of mean-square error and convergence rate.

II. Network Models and Multi-Task Learning

Consider a network with $N$ nodes deployed over a certain geographical area. At every time instant $n$, every node $k$ has
access to time realizations $\{d_k(n), \mathbf{u}_k(n)\}$, where $d_k(n)$ denotes a scalar zero-mean reference signal and $\mathbf{u}_k(n)$ is an $L \times 1$
regression vector, $\mathbf{u}_k(n) = [u_k(n), u_k(n-1), \ldots, u_k(n-L+1)]^T$, with covariance matrix $\mathbf{R}_{u,k} = E[\mathbf{u}_k(n)\mathbf{u}_k^T(n)]$. The data
at node $k$ are assumed to be related via the linear measurement model

$$d_k(n) = \mathbf{u}_k^T(n)\,\mathbf{w}_k^\star + e_k(n) \tag{1}$$

where $\mathbf{w}_k^\star$ is an unknown optimal parameter vector to be estimated at node $k$, and $e_k(n)$ is a zero-mean white observation noise with variance $\sigma_{v,k}^2$
that is independent of $\mathbf{u}_k(n)$ for all $k$.

Figure 1: Three types of networks. Through direct links, nodes can communicate with each other in one hop. (a) Single-task
network, (b) multi-task network, (c) clustered multi-task network.

Considering the number of parameter
vectors to be estimated, which we shall refer to as the number of tasks, the distributed learning problem can be single-task or
multi-task oriented. We therefore distinguish among the following three types of networks, as illustrated in Fig. 1, depending
on how the parameter vectors $\mathbf{w}_k^\star$ are related across nodes:

• Single-task networks: All nodes in the network have to estimate the same parameter vector $\mathbf{w}^\star$. That is, in this case we
have

$$\mathbf{w}_k^\star = \mathbf{w}^\star, \quad \forall k \in \{1, 2, \ldots, N\} \tag{2}$$

• Multi-task networks: Each node $k$ in the network has to determine its own optimum parameter vector, $\mathbf{w}_k^\star$. However, it is
assumed that similarities and relationships exist among the parameters of neighboring nodes, which we denote by writing

$$\mathbf{w}_k^\star \sim \mathbf{w}_l^\star, \quad \text{if } l \in \mathcal{N}_k \tag{3}$$

The sign $\sim$ represents a similarity relationship in some sense; its meaning will become clear once we introduce
expressions (8) and (9) further ahead. There are many situations in practice where the objective parameters are not
identical across clusters but have inherent relationships. It is therefore beneficial to exploit these relationships to enhance
performance. Here we focus on promoting the similarity of objective parameter vectors via their distance to each other.

• Clustered multi-task networks: Nodes are grouped into $Q$ clusters, and there is one task per cluster. The optimum parameter
vectors are only constrained to be equal within each cluster, but similarities between neighboring clusters are allowed to exist, namely,

$$\mathbf{w}_k^\star = \mathbf{w}_{\mathcal{C}_q}^\star \ \text{ whenever } k \in \mathcal{C}_q, \qquad \mathbf{w}_{\mathcal{C}_p}^\star \sim \mathbf{w}_{\mathcal{C}_q}^\star \ \text{ if } \mathcal{C}_p, \mathcal{C}_q \text{ are connected} \tag{4}$$

where $p$ and $q$ denote two cluster indexes. We say that two clusters $\mathcal{C}_p$ and $\mathcal{C}_q$ are connected if there exists at least one
edge linking a node from one cluster to a node in the other cluster.

One can observe that the single-task and multi-task networks are particular cases of the clustered multi-task network. When
all nodes are clustered together, the clustered multi-task network reduces to the single-task network. When,
on the other hand, each cluster involves only one node, the clustered multi-task network becomes a multi-task network.
Building on the literature on diffusion strategies for single-task networks, we shall now generalize their usage and analysis to
distributed learning over clustered multi-task networks. These results will also be applicable to multi-task networks by setting
the number of clusters equal to the number of nodes.
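The reduction to the two special cases can be illustrated with a small sketch. It splits a node's neighborhood into the in-cluster part $\mathcal{N}_k \cap \mathcal{C}(k)$ and the out-of-cluster part $\mathcal{N}_k \setminus \mathcal{C}(k)$. The four-node line topology, the cluster labels and the function name `split_neighborhood` are hypothetical choices made for illustration only.

```python
from typing import Dict, Set, Tuple


def split_neighborhood(
    neighbors: Dict[int, Set[int]],
    cluster_of: Dict[int, int],
    k: int,
) -> Tuple[Set[int], Set[int]]:
    """Return (N_k intersect C(k), N_k minus C(k)) for node k."""
    inside = {l for l in neighbors[k] if cluster_of[l] == cluster_of[k]}
    outside = neighbors[k] - inside
    return inside, outside


# Hypothetical line network 0-1-2-3; every neighborhood includes the node itself.
neighbors = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3}}

clustered = {0: 0, 1: 0, 2: 1, 3: 1}        # two clusters: {0, 1} and {2, 3}
single_task = {k: 0 for k in neighbors}     # one cluster containing every node
multi_task = {k: k for k in neighbors}      # one cluster per node

for name, labels in [("clustered", clustered),
                     ("single-task", single_task),
                     ("multi-task", multi_task)]:
    inside, outside = split_neighborhood(neighbors, labels, k=1)
    print(name, sorted(inside), sorted(outside))
```

With a single cluster the out-of-cluster set is empty for every node, and with one cluster per node the in-cluster set contains only the node itself, which matches the two limiting cases described above.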


III. Problem Formulation 


In clustered multi-task networks, the nodes that are grouped into a cluster estimate the same coefficient vector. Thus, consider the
cluster $\mathcal{C}(k)$ to which node $k$ belongs. Under certain settings, in order to provide independence from the input data correlation
statistics, we introduce updates that are normalized with respect to the input regressor $\mathbf{u}_k(n)$ at each node. A local cost function,
$J_k(\mathbf{w}_{\mathcal{C}(k)})$, is associated with node $k$, and the Hessian matrix of this cost function is assumed to be positive semi-definite.
The local cost function $J_k(\mathbf{w}_{\mathcal{C}(k)})$ is defined as

$$J_k(\mathbf{w}_{\mathcal{C}(k)}) = E\left\{ \left| \frac{d_k(n) - \mathbf{u}_k^T(n)\,\mathbf{w}_{\mathcal{C}(k)}}{\|\mathbf{u}_k(n)\|} \right|^2 \right\} \tag{5}$$


Depending on the application, there may be certain properties shared among the optimal vectors $\{\mathbf{w}_{\mathcal{C}_1}^\star, \ldots, \mathbf{w}_{\mathcal{C}_Q}^\star\}$. This mutual
information among tasks can be used to improve the estimation accuracy. Among the possible options, a simple yet effective
Euclidean-distance-based regularizer was enforced in [18]. The squared Euclidean distance regularizer is given by

$$\Delta(\mathbf{w}_{\mathcal{C}_k}, \mathbf{w}_{\mathcal{C}_l}) = \|\mathbf{w}_{\mathcal{C}_k} - \mathbf{w}_{\mathcal{C}_l}\|^2 \tag{6}$$


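To estimate the unknown parameter vectors $\{\mathbf{w}_{\mathcal{C}_1}^\star, \ldots, \mathbf{w}_{\mathcal{C}_Q}^\star\}$, it was shown in [18] that the local cost (5) and the regularizer
(6) can be combined at the level of each cluster. This formulation leads to the following estimation problem, defined in terms
of $Q$ Nash equilibrium problems [20], where each cluster $\mathcal{C}_j$ estimates $\mathbf{w}_{\mathcal{C}_j}^\star$ by minimizing the regularized cost function
$J_{\mathcal{C}_j}(\mathbf{w}_{\mathcal{C}_j}, \mathbf{w}_{-\mathcal{C}_j})$:

$$(\mathcal{P}_1) \quad \min_{\mathbf{w}_{\mathcal{C}_j}} J_{\mathcal{C}_j}(\mathbf{w}_{\mathcal{C}_j}, \mathbf{w}_{-\mathcal{C}_j})$$

$$\text{with } J_{\mathcal{C}_j}(\mathbf{w}_{\mathcal{C}_j}, \mathbf{w}_{-\mathcal{C}_j}) = \sum_{k \in \mathcal{C}_j} E\left\{ \left| \frac{d_k(n) - \mathbf{u}_k^T(n)\,\mathbf{w}_{\mathcal{C}(k)}}{\|\mathbf{u}_k(n)\|} \right|^2 \right\} + \eta \sum_{k \in \mathcal{C}_j} \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}_j} \rho_{kl}\, \|\mathbf{w}_{\mathcal{C}(k)} - \mathbf{w}_{\mathcal{C}(l)}\|^2 \tag{7}$$

for $j = 1, \ldots, Q$. Here $\mathbf{w}_{\mathcal{C}_j}$ is the parameter vector associated with cluster $\mathcal{C}_j$, $\eta > 0$ is a regularization parameter, and the
symbol $\setminus$ denotes the set difference. Note that we have kept the notation $\mathbf{w}_{\mathcal{C}(k)}$ in (7) to make the role of the regularization
term clearer, even though $\mathbf{w}_{\mathcal{C}(k)} = \mathbf{w}_{\mathcal{C}_j}$ for all $k$ in $\mathcal{C}_j$. The notation $\mathbf{w}_{-\mathcal{C}_j}$ denotes the collection of weight vectors
estimated by the other clusters, that is, $\mathbf{w}_{-\mathcal{C}_j} = \{\mathbf{w}_{\mathcal{C}_q};\ q = 1, \ldots, Q\} - \{\mathbf{w}_{\mathcal{C}_j}\}$. The non-negative coefficients $\rho_{kl}$ adjust
the regularization strength. In [18], the coefficients $\{\rho_{kl}\}$ were chosen to satisfy the conditions

$$\sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl} = 1, \quad \text{and} \quad \begin{cases} \rho_{kl} > 0, & \text{if } l \in \mathcal{N}_k \setminus \mathcal{C}(k), \\ \rho_{kk} \ge 0, \\ \rho_{kl} = 0, & \text{otherwise} \end{cases} \tag{8}$$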

We impose $\rho_{kl} = 0$ for all $l \notin \mathcal{N}_k \setminus \mathcal{C}(k)$, since nodes belonging to the same cluster estimate the same parameter vector.

Solving problem $(\mathcal{P}_1)$ requires every node in the network to have access to the statistical moments
$\mathbf{R}_{u,k}$ and $\mathbf{p}_{ud,k}$ over its cluster. However, node $k$ can only be assumed to have direct access to information from its
neighborhood $\mathcal{N}_k$, which may include nodes that are not part of the cluster $\mathcal{C}(k)$. Therefore, to enable a distributed solution
that relies only on measured data from the neighborhood, the cost function is relaxed into the following
form:

$$J_{\mathcal{C}(k)}(\mathbf{w}_k) = \sum_{l \in \mathcal{N}_k \cap \mathcal{C}(k)} c_{lk}\, E\left\{ \left| \frac{d_l(n) - \mathbf{u}_l^T(n)\,\mathbf{w}_k}{\|\mathbf{u}_l(n)\|} \right|^2 \right\} + \eta \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl}\, \|\mathbf{w}_k - \mathbf{w}_l\|^2 + \sum_{l} b_{lk}\, \|\mathbf{w}_k - \mathbf{w}_l^\star\|^2 \tag{9}$$

where the coefficients $c_{lk}$ are non-negative and satisfy the conditions

$$\sum_{k=1}^{N} c_{lk} = 1, \quad \text{and} \quad c_{lk} = 0 \ \text{ if } k \notin \mathcal{N}_l \cap \mathcal{C}(l) \tag{10}$$

and the coefficients $b_{lk}$ are also non-negative.

Following the line of reasoning used in [10] for the single-task case, extending the argument to problem (9)
by using Nash-equilibrium properties [20], and following the procedure described in [10], [21], the following adapt-then-combine (ATC) diffusion
strategy for the clustered multi-task normalized LMS (NLMS) is derived in a distributed manner:

$$\begin{aligned}
\boldsymbol{\psi}_k(n+1) &= \mathbf{w}_k(n) + \mu_k \frac{\mathbf{u}_k(n)}{\varepsilon + \|\mathbf{u}_k(n)\|^2}\big[d_k(n) - \mathbf{u}_k^T(n)\mathbf{w}_k(n)\big] + \mu_k \eta \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl}\big(\mathbf{w}_l(n) - \mathbf{w}_k(n)\big) \\
\mathbf{w}_k(n+1) &= \sum_{l \in \mathcal{N}_k \cap \mathcal{C}(k)} a_{lk}\,\boldsymbol{\psi}_l(n+1)
\end{aligned} \tag{11}$$

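By extending the above clustered multi-task diffusion strategy to the data-reuse case, we can derive the following affine projection
algorithm (APA) [22] based clustered multi-task diffusion strategy:

$$\begin{aligned}
\boldsymbol{\psi}_k(n+1) &= \mathbf{w}_k(n) + \mu_k \mathbf{U}_k^T(n)\big(\varepsilon\mathbf{I} + \mathbf{U}_k(n)\mathbf{U}_k^T(n)\big)^{-1}\big[\mathbf{d}_k(n) - \mathbf{U}_k(n)\mathbf{w}_k(n)\big] \\
&\quad + \mu_k \eta \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl}\big(\mathbf{w}_l(n) - \mathbf{w}_k(n)\big) \\
\mathbf{w}_k(n+1) &= \sum_{l \in \mathcal{N}_k \cap \mathcal{C}(k)} a_{lk}\,\boldsymbol{\psi}_l(n+1)
\end{aligned} \tag{12}$$

where $\eta$ denotes a regularization parameter with a small positive value, $\varepsilon$ is employed to avoid the inversion of a rank-deficient
matrix $\mathbf{U}_k(n)\mathbf{U}_k^T(n)$, and the input data matrix $\mathbf{U}_k(n)$ and desired response vector $\mathbf{d}_k(n)$ are given by

$$\mathbf{U}_k(n) = \begin{bmatrix} \mathbf{u}_k^T(n) \\ \mathbf{u}_k^T(n-1) \\ \vdots \\ \mathbf{u}_k^T(n-P+1) \end{bmatrix}, \qquad \mathbf{d}_k(n) = \begin{bmatrix} d_k(n) \\ d_k(n-1) \\ \vdots \\ d_k(n-P+1) \end{bmatrix} \tag{13}$$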


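The clustered multi-task diffusion APA algorithm is given below:

Algorithm 1: Diffusion APA for clustered multi-task networks
Start with $\mathbf{w}_k(0) = \mathbf{0}$ for all $k$, and repeat:

$$\begin{aligned}
\boldsymbol{\psi}_k(n+1) &= \mathbf{w}_k(n) + \mu_k \mathbf{U}_k^T(n)\big(\varepsilon\mathbf{I} + \mathbf{U}_k(n)\mathbf{U}_k^T(n)\big)^{-1}\big[\mathbf{d}_k(n) - \mathbf{U}_k(n)\mathbf{w}_k(n)\big] \\
&\quad + \mu_k \eta \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl}\big(\mathbf{w}_l(n) - \mathbf{w}_k(n)\big) \\
\mathbf{w}_k(n+1) &= \sum_{l \in \mathcal{N}_k \cap \mathcal{C}(k)} a_{lk}\,\boldsymbol{\psi}_l(n+1)
\end{aligned} \tag{14}$$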
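The two steps of Algorithm 1 can be sketched for a single node as follows. This is a minimal illustration using only the standard library, not the authors' implementation. The helper names (`solve`, `apa_adapt`, `combine`) and the numerical values are assumptions, and the small linear system is solved by plain Gaussian elimination rather than a numerical library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]          # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def apa_adapt(w_k: Vector, U_k: Matrix, d_k: Vector,
              outside: Sequence[Tuple[float, Vector]],
              mu: float, eta: float, eps: float) -> Vector:
    """Adaptation step of (14) for one node.

    U_k is the P x L data matrix (rows are regressors), d_k has length P,
    and `outside` lists (rho_kl, w_l) for the neighbors outside the cluster.
    """
    P, L = len(U_k), len(w_k)
    err = [d_k[p] - sum(U_k[p][j] * w_k[j] for j in range(L)) for p in range(P)]
    gram = [[sum(U_k[p][j] * U_k[q][j] for j in range(L)) + (eps if p == q else 0.0)
             for q in range(P)] for p in range(P)]
    g = solve(gram, err)                               # (eps I + U U^T)^{-1} e
    psi = [w_k[j] + mu * sum(U_k[p][j] * g[p] for p in range(P)) for j in range(L)]
    for rho, w_l in outside:                           # inter-cluster regularizer
        for j in range(L):
            psi[j] += mu * eta * rho * (w_l[j] - w_k[j])
    return psi


def combine(psis: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Combination step of (14): weighted sum of in-cluster intermediate estimates."""
    return [sum(a * psi[j] for a, psi in zip(weights, psis))
            for j in range(len(psis[0]))]


# Hypothetical noise-free data generated by w = [1, -2], with P = L = 2.
psi = apa_adapt([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, -2.0],
                outside=[], mu=1.0, eta=0.0, eps=1e-9)
print(psi)
```

In this toy setting, with $\mu_k = 1$, no regularization and a full-rank data matrix, a single adaptation step essentially recovers the generating vector, which is the projection behaviour that motivates using APA for correlated inputs.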


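In a single-task network, there is a single cluster that consists of the entire set of nodes. We then have $\mathcal{N}_k \cap \mathcal{C}(k) = \mathcal{N}_k$ and
$\mathcal{N}_k \setminus \mathcal{C}(k) = \emptyset$ for all $k$, so that expression (14) reduces to the diffusion adaptation strategy [10] described in Algorithm
2:

Algorithm 2: Diffusion APA for single-task networks
Start with $\mathbf{w}_k(0) = \mathbf{0}$ for all $k$, and repeat:

$$\begin{aligned}
\boldsymbol{\psi}_k(n+1) &= \mathbf{w}_k(n) + \mu_k \mathbf{U}_k^T(n)\big(\varepsilon\mathbf{I} + \mathbf{U}_k(n)\mathbf{U}_k^T(n)\big)^{-1}\big[\mathbf{d}_k(n) - \mathbf{U}_k(n)\mathbf{w}_k(n)\big] \\
\mathbf{w}_k(n+1) &= \sum_{l \in \mathcal{N}_k} a_{lk}\,\boldsymbol{\psi}_l(n+1)
\end{aligned} \tag{15}$$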


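In the case of a multi-task network where the size of each cluster is one, we have $\mathcal{N}_k \cap \mathcal{C}(k) = \{k\}$ and $\mathcal{N}_k \setminus \mathcal{C}(k) = \mathcal{N}_k \setminus \{k\}$ for
all $k$. Algorithm 1 then degenerates into Algorithm 3, which is the instantaneous gradient counterpart of (14) for each node.


Algorithm 3: Diffusion APA for multi-task networks
Start with $\mathbf{w}_k(0) = \mathbf{0}$ for all $k$, and repeat:

$$\begin{aligned}
\mathbf{w}_k(n+1) &= \mathbf{w}_k(n) + \mu_k \mathbf{U}_k^T(n)\big(\varepsilon\mathbf{I} + \mathbf{U}_k(n)\mathbf{U}_k^T(n)\big)^{-1}\big[\mathbf{d}_k(n) - \mathbf{U}_k(n)\mathbf{w}_k(n)\big] \\
&\quad + \mu_k \eta \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl}\big(\mathbf{w}_l(n) - \mathbf{w}_k(n)\big)
\end{aligned} \tag{16}$$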

IV. Mean-Square Error Performance Analysis 

A. Network Global Model 

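The space-time structure of the algorithm makes the performance analysis challenging. To proceed, let us first define the
global representations

$$\begin{aligned}
\boldsymbol{\psi}(n) &= \mathrm{col}\{\boldsymbol{\psi}_1(n), \boldsymbol{\psi}_2(n), \ldots, \boldsymbol{\psi}_N(n)\}, & \mathbf{w}(n) &= \mathrm{col}\{\mathbf{w}_1(n), \mathbf{w}_2(n), \ldots, \mathbf{w}_N(n)\} \\
\mathbf{U}(n) &= \mathrm{diag}\{\mathbf{U}_1(n), \mathbf{U}_2(n), \ldots, \mathbf{U}_N(n)\}, & \mathbf{d}(n) &= \mathrm{col}\{\mathbf{d}_1(n), \mathbf{d}_2(n), \ldots, \mathbf{d}_N(n)\}
\end{aligned} \tag{17}$$

where $\mathbf{U}(n)$ is an $NM \times LN$ block diagonal matrix (here $M$ is the number of rows of each $\mathbf{U}_k(n)$, i.e., the projection order written $P$ in (13)). The $LN \times LN$ diagonal matrices $\mathbf{D}$ and $\boldsymbol{\eta}$ are defined as

$$\begin{aligned}
\mathbf{D} &= \mathrm{diag}\{\mu_1 \mathbf{I}_L, \mu_2 \mathbf{I}_L, \ldots, \mu_N \mathbf{I}_L\} \\
\boldsymbol{\eta} &= \mathrm{diag}\{\eta_1 \mathbf{I}_L, \eta_2 \mathbf{I}_L, \ldots, \eta_N \mathbf{I}_L\}
\end{aligned} \tag{18}$$

to collect the local step-sizes and regularization parameters. From the linear model (1), the global model at the network
level is obtained as

$$\mathbf{d}(n) = \mathbf{U}(n)\mathbf{w}^\star + \mathbf{v}(n) \tag{19}$$

where $\mathbf{w}^\star$ and $\mathbf{v}(n)$ are the global optimal weight vector and the global noise vector, given by

$$\begin{aligned}
\mathbf{w}^\star &= \mathrm{col}\{\mathbf{w}_1^\star, \mathbf{w}_2^\star, \ldots, \mathbf{w}_N^\star\} \\
\mathbf{v}(n) &= \mathrm{col}\{\mathbf{v}_1(n), \mathbf{v}_2(n), \ldots, \mathbf{v}_N(n)\}
\end{aligned} \tag{20}$$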


To facilitate the analysis, the network topology is assumed to be static (i.e., $a_{lk}(n) = a_{lk}$). This assumption does not compromise
the algorithm derivation or its operation, and is used for analysis only. The analysis presented in [23] and [24] serves as the
basis for this work. Using the above expressions, the global model of the multi-task diffusion APA is formulated as
follows:

$$\mathbf{w}(n+1) = \mathcal{A}\Big( \mathbf{w}(n) + \mathbf{D}\,\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\big[\mathbf{d}(n) - \mathbf{U}(n)\mathbf{w}(n)\big] - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\,\mathbf{w}(n) \Big) \tag{21}$$

where

$$\mathcal{A} = \mathbf{A}^T \otimes \mathbf{I}_L, \qquad \mathbf{Q} = \mathbf{I}_{LN} - \mathbf{P} \otimes \mathbf{I}_L \tag{22}$$

with $\otimes$ denoting the Kronecker product. $\mathbf{A}$ is the $N \times N$ symmetric matrix that defines the network topology, and $\mathbf{P}$ is the
$N \times N$ asymmetric matrix that defines the regularizer strength among the nodes, with $\rho_{kk} = 1$ if $\mathcal{N}_k \setminus \mathcal{C}(k)$ is empty. The objective now is
to study the performance of the multi-task diffusion APA governed by (21).
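To see what the matrix $\mathbf{Q} = \mathbf{I}_{LN} - \mathbf{P} \otimes \mathbf{I}_L$ does to the stacked weight vector, the following sketch builds it for a small network and applies it. The two-node topology, the value $L = 2$ and the helper names are assumptions chosen only to keep the arithmetic visible.

```python
from typing import List

Matrix = List[List[float]]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Ordinary Kronecker product of two dense matrices."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def build_Q(P: Matrix, L: int) -> Matrix:
    """Q = I_{LN} - P kron I_L, as in (22)."""
    n = len(P) * L
    PI = kron(P, identity(L))
    I = identity(n)
    return [[I[i][j] - PI[i][j] for j in range(n)] for i in range(n)]


def matvec(M: Matrix, x: List[float]) -> List[float]:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical two-node multi-task network: each node regularizes toward the other.
P = [[0.0, 1.0],
     [1.0, 0.0]]
Q = build_Q(P, L=2)

w = [1.0, 2.0, 4.0, 6.0]        # stacked [w_1; w_2] with w_1 = [1, 2], w_2 = [4, 6]
print(matvec(Q, w))             # stacked [w_1 - w_2; w_2 - w_1]
```

Each block row of $\mathbf{Q}\mathbf{w}(n)$ is the weighted difference $\sum_l \rho_{kl}(\mathbf{w}_k(n) - \mathbf{w}_l(n))$, so the term $-\mathbf{D}\boldsymbol{\eta}\mathbf{Q}\mathbf{w}(n)$ in (21) is the stacked form of the per-node regularization term in (14).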


B. Mean Error Behavior Analysis 

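The global error vector $\mathbf{e}(n)$ is related to the local error vectors $\mathbf{e}_k(n)$ as

$$\mathbf{e}(n) = \mathrm{col}\{\mathbf{e}_1(n), \mathbf{e}_2(n), \ldots, \mathbf{e}_N(n)\} \tag{23}$$

By denoting $\tilde{\mathbf{w}}(n) = \mathbf{w}^\star - \mathbf{w}(n)$, the global error vector can be rewritten as

$$\mathbf{e}(n) = \big[\mathbf{d}(n) - \mathbf{U}(n)\mathbf{w}(n)\big] = \mathbf{U}(n)\tilde{\mathbf{w}}(n) + \mathbf{v}(n) = \mathbf{e}_a(n) + \mathbf{v}(n) \tag{24}$$

where

$$\mathbf{e}_a(n) = \mathbf{U}(n)\tilde{\mathbf{w}}(n) \tag{25}$$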

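Using these results, the recursive update equation of the global weight error vector can be written as

$$\begin{aligned}
\tilde{\mathbf{w}}(n+1) &= \mathcal{A}\Big( \tilde{\mathbf{w}}(n) - \mathbf{D}\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{U}(n)\tilde{\mathbf{w}}(n) - \mathbf{D}\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{v}(n) - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\big[\tilde{\mathbf{w}}(n) - \mathbf{w}^\star\big] \Big) \\
&= \mathcal{A}\big[\mathbf{I}_{LN} - \mathbf{D}\mathbf{Z}(n) - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\big]\tilde{\mathbf{w}}(n) - \mathcal{A}\mathbf{D}\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{v}(n) + \mathcal{A}\mathbf{D}\boldsymbol{\eta}\mathbf{Q}\,\mathbf{w}^\star
\end{aligned} \tag{26}$$

where $\mathbf{Z}(n) = \mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{U}(n)$. Taking the expectation $E[\cdot]$ of both sides, using the statistical
independence between $\tilde{\mathbf{w}}_k(n)$ and $\mathbf{U}_k(n)$ (i.e., the independence assumption), and recalling that $\mathbf{v}_k(n)$ is zero-mean i.i.d. and
independent of $\mathbf{U}_k(n)$, and thus of $\tilde{\mathbf{w}}_k(n)$, we can write

$$E[\tilde{\mathbf{w}}(n+1)] = \mathcal{A}\big[\mathbf{I}_{LN} - \mathbf{D}\bar{\mathbf{Z}} - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\big]E[\tilde{\mathbf{w}}(n)] + \mathcal{A}\mathbf{D}\boldsymbol{\eta}\mathbf{Q}\,\mathbf{w}^\star \tag{27}$$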


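Then, for any initial condition, in order to guarantee the stability of the multi-task diffusion APA strategy in the mean sense,
the step size $\mu_k$ has to be chosen to satisfy

$$\lambda_{\max}\big(\mathcal{A}\big[\mathbf{I}_{LN} - \mathbf{D}\bar{\mathbf{Z}} - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\big]\big) < 1 \tag{28}$$

where $\bar{\mathbf{Z}} = E[\mathbf{Z}(n)]$ and $\lambda_{\max}(\cdot)$ denotes the maximum eigenvalue of its argument matrix. Using norm
inequalities and recalling that the combining matrix $\mathcal{A}$ is a left stochastic matrix (i.e., its block maximum norm is equal
to one), we have

$$\big\|\mathcal{A}\big[\mathbf{I}_{LN} - \mathbf{D}\bar{\mathbf{Z}} - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\big]\big\|_{b,\infty} \le \big\|\mathbf{I}_{LN} - \mathbf{D}\bar{\mathbf{Z}} - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\big\|_{b,\infty} = \big\|\mathbf{I}_{LN} - \mathbf{D}\bar{\mathbf{Z}} - \mathbf{D}\boldsymbol{\eta} + \mathbf{D}\boldsymbol{\eta}(\mathbf{P} \otimes \mathbf{I}_L)\big\|_{b,\infty} \tag{29}$$

Let $\mathbf{A}$ be an $L \times L$ matrix with entries $a_{i,j}$. Then, from the Gershgorin circle theorem, every eigenvalue $\lambda$ of $\mathbf{A}$ satisfies, for at least one index $i$,

$$|\lambda - a_{i,i}| \le \sum_{j \ne i} |a_{i,j}| \tag{30}$$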

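Therefore, using the above result and recalling that $\mathbf{P}$ is a right stochastic matrix, a sufficient condition for (29) to
hold is to choose $\mu_k$ such that

$$0 < \mu_k < \frac{2}{\max_k\{\lambda_{\max}(\bar{\mathbf{Z}}_k)\} + 2\eta} \tag{31}$$

where $\bar{\mathbf{Z}}_k = E\big[\mathbf{U}_k^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}_k(n)\mathbf{U}_k^T(n)\big]^{-1}\mathbf{U}_k(n)\big]$. This result clearly shows that the mean stability limit of the clustered
multi-task diffusion APA is lower than that of the diffusion APA, owing to the presence of $\eta$.

In steady state, i.e., as $n \to \infty$, the asymptotic mean bias is given by

$$\lim_{n \to \infty} E[\tilde{\mathbf{w}}(n)] = \Big[\mathbf{I}_{LN} - \mathcal{A}\big[\mathbf{I}_{LN} - \mathbf{D}\bar{\mathbf{Z}} - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\big]\Big]^{-1}\mathcal{A}\mathbf{D}\boldsymbol{\eta}\mathbf{Q}\,\mathbf{w}^\star \tag{32}$$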
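The step-size condition (31) is straightforward to evaluate once the largest eigenvalues of the matrices $\bar{\mathbf{Z}}_k$ are available. The sketch below assumes those eigenvalues have already been estimated elsewhere. The function name and the numbers are hypothetical and serve only to show how the regularization strength $\eta$ tightens the limit.

```python
from typing import Sequence


def mean_stability_bound(lambda_max_Z: Sequence[float], eta: float) -> float:
    """Upper limit on mu_k from (31): 2 / (max_k lambda_max(Z_k) + 2 * eta)."""
    if eta < 0:
        raise ValueError("eta must be non-negative")
    return 2.0 / (max(lambda_max_Z) + 2.0 * eta)


# Hypothetical per-node values of lambda_max(Z_k); these are illustrative only.
lams = [0.80, 0.95, 0.90]

print(mean_stability_bound(lams, eta=0.0))   # limit without inter-cluster regularization
print(mean_stability_bound(lams, eta=0.5))   # smaller limit once eta > 0
```

Setting `eta=0.0` gives the limit that (31) implies for the plain diffusion APA, so the comparison makes the remark after (31) concrete.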
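
C. Mean-Square Error Behavior Analysis

The recursive update equation of the weight error vector can also be rewritten as follows:

$$\tilde{\mathbf{w}}(n+1) = \mathcal{G}(n)\tilde{\mathbf{w}}(n) - \mathcal{A}\mathbf{D}\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{v}(n) + \mathbf{r} \tag{33}$$

where

$$\begin{aligned}
\mathcal{G}(n) &= \mathcal{A}\big[\mathbf{I}_{LN} - \mathbf{D}\mathbf{Z}(n) - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}\big] \\
\mathbf{r} &= \mathcal{A}\mathbf{D}\boldsymbol{\eta}\mathbf{Q}\,\mathbf{w}^\star
\end{aligned} \tag{34}$$

Using the standard independence assumption between $\mathbf{U}_k(n)$ and $\tilde{\mathbf{w}}_k(n)$, together with $E[\mathbf{v}(n)] = \mathbf{0}$, the mean square of the weight error
vector $\tilde{\mathbf{w}}(n+1)$, weighted by any positive semi-definite matrix $\boldsymbol{\Sigma}$ that we are free to choose, satisfies the following relation:

$$E\|\tilde{\mathbf{w}}(n+1)\|_{\boldsymbol{\Sigma}}^2 = E\|\tilde{\mathbf{w}}(n)\|_{E\boldsymbol{\Sigma}'}^2 + E\big[\mathbf{v}^T(n)\mathbf{Y}_\Sigma(n)\mathbf{v}(n)\big] + E[\tilde{\mathbf{w}}^T(n)]E[\mathcal{G}^T(n)]\boldsymbol{\Sigma}\,\mathbf{r} + \mathbf{r}^T\boldsymbol{\Sigma}\,E[\mathcal{G}(n)]E[\tilde{\mathbf{w}}(n)] + \|\mathbf{r}\|_{\boldsymbol{\Sigma}}^2 \tag{35}$$

where

$$\begin{aligned}
E\boldsymbol{\Sigma}' &= E\big[\mathcal{G}^T(n)\boldsymbol{\Sigma}\,\mathcal{G}(n)\big] \\
&= \mathcal{A}^T\boldsymbol{\Sigma}\mathcal{A} - E[\mathbf{Z}(n)]\mathbf{D}\mathcal{A}^T\boldsymbol{\Sigma}\mathcal{A} - \mathcal{A}^T\boldsymbol{\Sigma}\mathcal{A}\mathbf{D}E[\mathbf{Z}(n)] - \mathcal{A}^T\boldsymbol{\Sigma}\mathbf{S} - \mathbf{S}^T\boldsymbol{\Sigma}\mathcal{A} \\
&\quad + E[\mathbf{Z}(n)]\mathbf{D}\mathcal{A}^T\boldsymbol{\Sigma}\mathbf{S} + \mathbf{S}^T\boldsymbol{\Sigma}\mathcal{A}\mathbf{D}E[\mathbf{Z}(n)] + E\big[\mathbf{U}^T(n)\mathbf{Y}_\Sigma\mathbf{U}(n)\big] + \mathbf{S}^T\boldsymbol{\Sigma}\mathbf{S}
\end{aligned} \tag{36}$$

and

$$\begin{aligned}
\mathbf{Y}_\Sigma &= \big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{U}(n)\,\mathbf{D}\mathcal{A}^T\boldsymbol{\Sigma}\mathcal{A}\mathbf{D}\,\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1} \\
\mathbf{S} &= \mathcal{A}\mathbf{D}\boldsymbol{\eta}\mathbf{Q}
\end{aligned} \tag{37}$$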


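In order to study the behavior of the multi-task diffusion APA algorithm, the following moments in (35) and (36) must be
evaluated:

$$\begin{aligned}
&E\big[\mathbf{Z}(n)\mathbf{D}\mathcal{A}^T\boldsymbol{\Sigma}\mathcal{A}\mathbf{D}\mathbf{Z}(n)\big] \\
&E\Big[\mathbf{v}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{U}(n)\,\mathbf{D}\mathcal{A}^T\boldsymbol{\Sigma}\mathcal{A}\mathbf{D}\,\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{v}(n)\Big]
\end{aligned} \tag{38}$$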


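To extract the matrix $\boldsymbol{\Sigma}$ from the expectation terms, a weighted variance relation is introduced by using $L^2N^2 \times 1$ column
vectors:

$$\boldsymbol{\sigma} = \mathrm{bvec}\{\boldsymbol{\Sigma}\} \quad \text{and} \quad \boldsymbol{\sigma}' = \mathrm{bvec}\{E\boldsymbol{\Sigma}'\} \tag{39}$$

where $\mathrm{bvec}\{\cdot\}$ denotes the block vector operator. In addition, $\mathrm{bvec}\{\cdot\}$ is also used to recover the original matrix $\boldsymbol{\Sigma}$ from $\boldsymbol{\sigma}$.
One property of the $\mathrm{bvec}\{\cdot\}$ operator when working with the block Kronecker product [26] is used in this work, namely,

$$\mathrm{bvec}\{\mathbf{Q}\boldsymbol{\Sigma}\mathbf{P}\} = (\mathbf{P}^T \otimes_b \mathbf{Q})\,\boldsymbol{\sigma} \tag{40}$$

where $\mathbf{P} \otimes_b \mathbf{Q}$ denotes the block Kronecker product [25], [26] of two block matrices.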
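Property (40) is the block counterpart of the familiar identity $\mathrm{vec}\{\mathbf{Q}\boldsymbol{\Sigma}\mathbf{P}\} = (\mathbf{P}^T \otimes \mathbf{Q})\,\mathrm{vec}\{\boldsymbol{\Sigma}\}$ for the ordinary Kronecker product and column-stacking $\mathrm{vec}$. The sketch below checks only this ordinary (unblocked) identity on small hypothetical matrices. It does not implement $\mathrm{bvec}$ or $\otimes_b$ themselves, which act block by block and reduce to the ordinary operators when every block is a scalar.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def vec(A: Matrix) -> List[float]:
    """Stack the columns of A into one long vector."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def kron(A: Matrix, B: Matrix) -> Matrix:
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matvec(M: Matrix, x: List[float]) -> List[float]:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical 2 x 2 matrices used only to exercise the identity.
Q = [[1.0, 2.0], [3.0, 4.0]]
S = [[0.5, -1.0], [2.0, 0.0]]
P = [[0.0, 1.0], [-1.0, 2.0]]

lhs = vec(matmul(matmul(Q, S), P))              # vec{Q S P}
rhs = matvec(kron(transpose(P), Q), vec(S))     # (P^T kron Q) vec{S}
print(max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The same pattern, with the weighting matrix pulled out to the right as a vector, is what allows each term of (36) to be written as a matrix acting on $\boldsymbol{\sigma}$.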

Using (l40l > to (l36l > after block vectorization, the following terms on the right side of (l36l > are given by 


to 


bvecj*4 3 £ A | = j A 7 A 1 ^ cr 

bvecj.E[Z(n)] D.4 T £.4.j = (l LN ® b E[Z(n)]^J jlLAr®bD) j A 7 ® b A 1 j cr 

bvec j«4. T £ AD E[Z(n)]^ = (E[ Z(n)] ®b Iljv) (d ® b Ilat j j-A T ®b «4. J j cr 

bvecj .A 7 Ssj = bvecjA 7 SAD^Qj 

= (0, T ®b IlJV^ (jl ®b I LN^ jD (g>b I LnJ (a? <S>b A 7 j <J 

bvec js T £ Aj = bvecj Q 1 T) D A t £ Aj 

= ^Iljv <8>b Q 1 j jlz,.iv ®b ?? j jlz,tv ®b D j ^A 7 <8>b A 7 j o - 

bvec j.E[Z(n)] DA T £sj = bvec j.E[Z(?b)] D A T £ AD r? gj 

= (Iln ®b E[Z(n)\ j (Q T ®b Iljv) (jl ®b Iln) (d ®b jA 7 <8>b Aj cr 


(41) 

(42) 

(43) 

(44) 

(45) 

(46) 


bvec js T £ AD E[Z(n)] j = bvecj g T T 7 D A T £ 4D£[Z(n)] j 

= jf?[Z(n)] © Iljv j jlLAr ®b Q 1 j jlijv ®b jo ®b oj ^A T ®b A 7 


(47) 


bvecj U T (n)Y s U(n) j = bvecj E[Z{n) D/SAD Z(n)] j 

= l?[Z(n) ®b Z(n)] ®b oj jA T ®b A 1 j cr 

bvec js r £ S j = bvecj Q 1 r) D A T £ AD r/ Q j 

= {Q T ®b Q T ) (rj ®b v) (d ®b D) [A T ®b A T ) cr 
Therefore, a linear relation between the corresponding vectors {cr, cr }is formulated by 


(48) 


(49) 


cr = Fcr 


(50) 
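
where $\mathbf{F}$ is an $L^2N^2 \times L^2N^2$ matrix given by

$$\begin{aligned}
\mathbf{F} = \Big[\, &\mathbf{I}_{(LN)^2} - (\mathbf{I}_{LN} \otimes_b \bar{\mathbf{Z}})(\mathbf{I}_{LN} \otimes_b \mathbf{D}) - (\bar{\mathbf{Z}} \otimes_b \mathbf{I}_{LN})(\mathbf{D} \otimes_b \mathbf{I}_{LN}) \\
&- (\mathbf{Q}^T \otimes_b \mathbf{I}_{LN})(\boldsymbol{\eta} \otimes_b \mathbf{I}_{LN})(\mathbf{D} \otimes_b \mathbf{I}_{LN}) - (\mathbf{I}_{LN} \otimes_b \mathbf{Q}^T)(\mathbf{I}_{LN} \otimes_b \boldsymbol{\eta})(\mathbf{I}_{LN} \otimes_b \mathbf{D}) \\
&+ (\mathbf{I}_{LN} \otimes_b \bar{\mathbf{Z}})(\mathbf{Q}^T \otimes_b \mathbf{I}_{LN})(\boldsymbol{\eta} \otimes_b \mathbf{I}_{LN})(\mathbf{D} \otimes_b \mathbf{D}) \\
&+ (\bar{\mathbf{Z}} \otimes_b \mathbf{I}_{LN})(\mathbf{I}_{LN} \otimes_b \mathbf{Q}^T)(\mathbf{I}_{LN} \otimes_b \boldsymbol{\eta})(\mathbf{D} \otimes_b \mathbf{D}) + \boldsymbol{\Pi}(\mathbf{D} \otimes_b \mathbf{D}) \\
&+ (\mathbf{Q}^T \otimes_b \mathbf{Q}^T)(\boldsymbol{\eta} \otimes_b \boldsymbol{\eta})(\mathbf{D} \otimes_b \mathbf{D}) \Big](\mathcal{A}^T \otimes_b \mathcal{A}^T)
\end{aligned} \tag{51}$$

where $\boldsymbol{\Pi} = E[\mathbf{Z}(n) \otimes_b \mathbf{Z}(n)]$. Let $\boldsymbol{\Lambda}_v = E[\mathbf{v}(n)\mathbf{v}^T(n)]$ denote an $NM \times NM$ diagonal matrix whose entries are the
noise variances $\sigma_{v,k}^2$ for $k = 1, 2, \ldots, N$, given by

$$\boldsymbol{\Lambda}_v = \mathrm{diag}\{\sigma_{v,1}^2\mathbf{I}_M, \sigma_{v,2}^2\mathbf{I}_M, \ldots, \sigma_{v,N}^2\mathbf{I}_M\} \tag{52}$$

Using the independence assumption on the noise signals, the term $E[\mathbf{v}^T(n)\mathbf{Y}_\Sigma(n)\mathbf{v}(n)]$ can be written as

$$E\big[\mathbf{v}^T(n)\mathbf{Y}_\Sigma(n)\mathbf{v}(n)\big] = \mathrm{Tr}\big(\mathcal{A}\mathbf{D}E[\boldsymbol{\Phi}]\mathbf{D}\mathcal{A}^T\boldsymbol{\Sigma}\big) = \boldsymbol{\gamma}^T\boldsymbol{\sigma} \tag{53}$$

where $\boldsymbol{\Phi} = \mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\boldsymbol{\Lambda}_v\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{U}(n)$ and

$$\begin{aligned}
\boldsymbol{\gamma} &= \mathrm{vec}\{\mathcal{A}\mathbf{D}E[\boldsymbol{\Phi}]\mathbf{D}^T\mathcal{A}^T\} \\
&= (\mathcal{A} \otimes \mathcal{A})(\mathbf{D} \otimes \mathbf{D})\,\mathrm{vec}\{E[\mathbf{W}^T\boldsymbol{\Lambda}_v\mathbf{W}]\} \\
&= (\mathcal{A} \otimes \mathcal{A})(\mathbf{D} \otimes \mathbf{D})\,E[\mathbf{W}^T \otimes \mathbf{W}^T]\,\boldsymbol{\gamma}_v
\end{aligned} \tag{54}$$

with $\mathbf{W} = \big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{U}(n)$ and $\boldsymbol{\gamma}_v = \mathrm{vec}\{\boldsymbol{\Lambda}_v\}$.

Finally, let us define $f(\mathbf{r}, E[\tilde{\mathbf{w}}(n)], \boldsymbol{\sigma})$ as the last three terms on the right-hand side of (35), i.e.,

$$f(\mathbf{r}, E[\tilde{\mathbf{w}}(n)], \boldsymbol{\sigma}) = \|\mathbf{r}\|_{\boldsymbol{\Sigma}}^2 + E[\tilde{\mathbf{w}}^T(n)]E[\mathcal{G}^T(n)]\boldsymbol{\Sigma}\,\mathbf{r} + \mathbf{r}^T\boldsymbol{\Sigma}\,E[\mathcal{G}(n)]E[\tilde{\mathbf{w}}(n)] \tag{55}$$

Each term can be evaluated as follows. First, consider the term $\|\mathbf{r}\|_{\boldsymbol{\Sigma}}^2$, which can be written as

$$\|\mathbf{r}\|_{\boldsymbol{\Sigma}}^2 = \Big(\mathrm{bvec}\big\{\mathcal{A}\mathbf{D}\boldsymbol{\eta}\mathbf{Q}\,\mathbf{w}^\star(\mathbf{w}^\star)^T\mathbf{Q}^T\boldsymbol{\eta}\mathbf{D}\mathcal{A}^T\big\}\Big)^T\boldsymbol{\sigma} = \mathbf{r}_b^T\boldsymbol{\sigma} \tag{56}$$

where

$$\mathbf{r}_b = (\mathcal{A} \otimes_b \mathcal{A})(\mathbf{D} \otimes_b \mathbf{D})(\boldsymbol{\eta} \otimes_b \boldsymbol{\eta})(\mathbf{Q} \otimes_b \mathbf{Q})\,\mathrm{bvec}\{\mathbf{w}^\star(\mathbf{w}^\star)^T\} \tag{57}$$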
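
Consider the second term, $E[\tilde{\mathbf{w}}^T(n)\mathcal{G}^T(n)\boldsymbol{\Sigma}\,\mathbf{r}]$, which can be simplified as follows:

$$E\big[\tilde{\mathbf{w}}^T(n)\mathcal{G}^T(n)\boldsymbol{\Sigma}\,\mathbf{r}\big] = \mathrm{Tr}\big(E[\mathbf{r}\,\tilde{\mathbf{w}}^T(n)\mathcal{G}^T(n)]\boldsymbol{\Sigma}\big) = \boldsymbol{\alpha}_1^T(n)\,\boldsymbol{\sigma} \tag{58}$$

where

$$\begin{aligned}
\boldsymbol{\alpha}_1(n) = (\mathcal{A} \otimes_b \mathcal{A})\Big[ &(\mathbf{I}_{LN} \otimes_b \mathbf{D})(\mathbf{I}_{LN} \otimes_b \boldsymbol{\eta})(\mathbf{I}_{LN} \otimes_b \mathbf{Q}) \\
&- (\mathbf{D} \otimes_b \mathbf{D})(\mathbf{I}_{LN} \otimes_b \boldsymbol{\eta})(\mathbf{I}_{LN} \otimes_b \mathbf{Q})(\bar{\mathbf{Z}} \otimes_b \mathbf{I}_{LN}) \\
&- (\mathbf{D} \otimes_b \mathbf{D})(\boldsymbol{\eta} \otimes_b \boldsymbol{\eta})(\mathbf{Q} \otimes_b \mathbf{Q}) \Big]\,\mathrm{bvec}\{\mathbf{w}^\star E[\tilde{\mathbf{w}}^T(n)]\}
\end{aligned} \tag{59}$$

In the same way, the third term, $E[\mathbf{r}^T\boldsymbol{\Sigma}\,\mathcal{G}(n)\tilde{\mathbf{w}}(n)]$, can be written as follows:

$$E\big[\mathbf{r}^T\boldsymbol{\Sigma}\,\mathcal{G}(n)\tilde{\mathbf{w}}(n)\big] = \mathrm{Tr}\big(E[\mathcal{G}(n)\tilde{\mathbf{w}}(n)\mathbf{r}^T]\boldsymbol{\Sigma}\big) = \boldsymbol{\alpha}_2^T(n)\,\boldsymbol{\sigma} \tag{60}$$

where

$$\begin{aligned}
\boldsymbol{\alpha}_2(n) = (\mathcal{A} \otimes_b \mathcal{A})\Big[ &(\mathbf{D} \otimes_b \mathbf{I}_{LN})(\boldsymbol{\eta} \otimes_b \mathbf{I}_{LN})(\mathbf{Q} \otimes_b \mathbf{I}_{LN}) \\
&- (\mathbf{D} \otimes_b \mathbf{D})(\boldsymbol{\eta} \otimes_b \mathbf{I}_{LN})(\mathbf{Q} \otimes_b \mathbf{I}_{LN})(\mathbf{I}_{LN} \otimes_b \bar{\mathbf{Z}}) \\
&- (\mathbf{D} \otimes_b \mathbf{D})(\boldsymbol{\eta} \otimes_b \boldsymbol{\eta})(\mathbf{Q} \otimes_b \mathbf{Q}) \Big]\,\mathrm{bvec}\{E[\tilde{\mathbf{w}}(n)](\mathbf{w}^\star)^T\}
\end{aligned} \tag{61}$$

Therefore, the mean-square behavior of the multi-task diffusion APA algorithm is summarized as follows:

$$\begin{aligned}
E\|\tilde{\mathbf{w}}(n+1)\|_{\boldsymbol{\sigma}}^2 &= E\|\tilde{\mathbf{w}}(n)\|_{\mathbf{F}\boldsymbol{\sigma}}^2 + \boldsymbol{\gamma}^T\boldsymbol{\sigma} + f(\mathbf{r}, E[\tilde{\mathbf{w}}(n)], \boldsymbol{\sigma}) \\
&= E\|\tilde{\mathbf{w}}(n)\|_{\mathbf{F}\boldsymbol{\sigma}}^2 + \boldsymbol{\gamma}^T\boldsymbol{\sigma} + \big[\mathbf{r}_b^T + \boldsymbol{\alpha}_1^T(n) + \boldsymbol{\alpha}_2^T(n)\big]\boldsymbol{\sigma}
\end{aligned} \tag{62}$$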


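Therefore, the multi-task diffusion strategy presented in (14) is mean-square stable if the matrix $\mathbf{F}$ is stable. Iterating the
recursion (62) starting from $n = 0$, we get

$$E\|\tilde{\mathbf{w}}(n+1)\|_{\boldsymbol{\sigma}}^2 = E\|\tilde{\mathbf{w}}(0)\|_{\mathbf{F}^{n+1}\boldsymbol{\sigma}}^2 + \boldsymbol{\gamma}^T\sum_{i=0}^{n}\mathbf{F}^i\boldsymbol{\sigma} + \sum_{i=0}^{n} f(\mathbf{r}, E[\tilde{\mathbf{w}}(n-i)], \mathbf{F}^i\boldsymbol{\sigma}) \tag{63}$$

with initial condition $\tilde{\mathbf{w}}(0) = \mathbf{w}^\star - \mathbf{w}(0)$. If the matrix $\mathbf{F}$ is stable, then the first and second terms in the above equation
converge to finite values as $n \to \infty$. Now consider the third term on the right-hand side of (63). We know that $E[\tilde{\mathbf{w}}(n)]$ is
uniformly bounded because (27) is a BIBO stable recursion with bounded driving term $\mathcal{A}\mathbf{D}\boldsymbol{\eta}\mathbf{Q}\,\mathbf{w}^\star$. Therefore, from (55),
$f(\mathbf{r}, E[\tilde{\mathbf{w}}(n-i)], \mathbf{F}^i\boldsymbol{\sigma})$ can be written as

$$f(\mathbf{r}, E[\tilde{\mathbf{w}}(n-i)], \mathbf{F}^i\boldsymbol{\sigma}) = \big[\mathbf{r}_b^T + \boldsymbol{\alpha}_1^T(n-i) + \boldsymbol{\alpha}_2^T(n-i)\big]\mathbf{F}^i\boldsymbol{\sigma} \tag{64}$$

Provided that $\mathbf{F}$ is stable, there exists a matrix norm, denoted by $\|\cdot\|_p$, such that $\|\mathbf{F}\|_p = c_p < 1$. Applying this norm to $f$
and using matrix norm properties and the triangle inequality, we can write $|f(\mathbf{r}, E[\tilde{\mathbf{w}}(n-i)], \mathbf{F}^i\boldsymbol{\sigma})| \le v\,c_p^i$, where $v$ is a small positive
constant. Therefore $E\|\tilde{\mathbf{w}}(n+1)\|_{\boldsymbol{\sigma}}^2$ converges to a bounded value as $n \to \infty$, and the algorithm is said to be mean-square
stable.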

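By selecting $\boldsymbol{\Sigma} = \frac{1}{N}\mathbf{I}_{LN}$, we can relate $E\|\tilde{\mathbf{w}}(n+1)\|_{\boldsymbol{\sigma}}^2$ and $E\|\tilde{\mathbf{w}}(n)\|_{\boldsymbol{\sigma}}^2$ as follows:

$$\begin{aligned}
E\|\tilde{\mathbf{w}}(n+1)\|_{\boldsymbol{\sigma}}^2 = {} & E\|\tilde{\mathbf{w}}(n)\|_{\boldsymbol{\sigma}}^2 + \boldsymbol{\gamma}^T\mathbf{F}^n\boldsymbol{\sigma} - E\|\tilde{\mathbf{w}}(0)\|_{(\mathbf{I}_{(LN)^2} - \mathbf{F})\mathbf{F}^n\boldsymbol{\sigma}}^2 \\
& + \sum_{i=0}^{n} f(\mathbf{r}, E[\tilde{\mathbf{w}}(n-i)], \mathbf{F}^i\boldsymbol{\sigma}) - \sum_{i=0}^{n-1} f(\mathbf{r}, E[\tilde{\mathbf{w}}(n-1-i)], \mathbf{F}^i\boldsymbol{\sigma})
\end{aligned} \tag{65}$$

We can rewrite the last two terms in the above equation as

$$\sum_{i=0}^{n} f(\mathbf{r}, E[\tilde{\mathbf{w}}(n-i)], \mathbf{F}^i\boldsymbol{\sigma}) - \sum_{i=0}^{n-1} f(\mathbf{r}, E[\tilde{\mathbf{w}}(n-1-i)], \mathbf{F}^i\boldsymbol{\sigma}) = \mathbf{r}_b^T\mathbf{F}^n\boldsymbol{\sigma} + \big[\boldsymbol{\alpha}_1^T(n) + \boldsymbol{\alpha}_2^T(n) + \boldsymbol{\Gamma}(n)\big]\boldsymbol{\sigma} \tag{66}$$

where

$$\boldsymbol{\Gamma}(n) = \sum_{i=1}^{n}\big[\boldsymbol{\alpha}_1^T(n-i) + \boldsymbol{\alpha}_2^T(n-i)\big]\mathbf{F}^i - \sum_{i=0}^{n-1}\big[\boldsymbol{\alpha}_1^T(n-1-i) + \boldsymbol{\alpha}_2^T(n-1-i)\big]\mathbf{F}^i$$

Therefore, the recursion presented in (62) can be rewritten as

$$E\|\tilde{\mathbf{w}}(n+1)\|_{\boldsymbol{\sigma}}^2 = E\|\tilde{\mathbf{w}}(n)\|_{\boldsymbol{\sigma}}^2 + \boldsymbol{\gamma}^T\mathbf{F}^n\boldsymbol{\sigma} - E\|\tilde{\mathbf{w}}(0)\|_{(\mathbf{I}_{(LN)^2} - \mathbf{F})\mathbf{F}^n\boldsymbol{\sigma}}^2 + \mathbf{r}_b^T\mathbf{F}^n\boldsymbol{\sigma} + \big[\boldsymbol{\alpha}_1^T(n) + \boldsymbol{\alpha}_2^T(n) + \boldsymbol{\Gamma}(n)\big]\boldsymbol{\sigma} \tag{67}$$

$$\boldsymbol{\Gamma}(n+1) = \boldsymbol{\Gamma}(n)\mathbf{F} + \big[\boldsymbol{\alpha}_1^T(n) + \boldsymbol{\alpha}_2^T(n)\big]\big[\mathbf{F} - \mathbf{I}_{(LN)^2}\big] \tag{68}$$

with $\boldsymbol{\Gamma}(0) = \mathbf{0}_{1 \times (LN)^2}$.

The steady-state MSD of the multi-task diffusion APA strategy is given by

$$\lim_{n \to \infty} E\|\tilde{\mathbf{w}}(n)\|_{(\mathbf{I}_{(LN)^2} - \mathbf{F})\boldsymbol{\sigma}}^2 = \boldsymbol{\gamma}^T\boldsymbol{\sigma} + f(\mathbf{r}, E[\tilde{\mathbf{w}}(\infty)], \boldsymbol{\sigma}) \tag{69}$$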


D. New Approach to Improve the Performance of Clustered Multi-Task Diffusion APA

The clustered multi-task diffusion strategy presented in (14) has two main drawbacks:

• At time instant $n$, assume that node $l$ performs poorly compared with node $k$. The multi-task diffusion strategy
nevertheless forces node $k$ to learn from node $l$ during the adaptation step, where $l \in \mathcal{N}_k \setminus \mathcal{C}(k)$. This degrades the performance in the
transient state.

• For all $l \in \mathcal{N}_k \setminus \mathcal{C}(k)$ we have $\mathbf{w}_k^\star \sim \mathbf{w}_l^\star$, i.e., the underlying systems are only similar. However, the multi-task diffusion strategy
forces node $k$ to learn from node $l$ even in the steady state. This hampers the steady-state performance of the algorithm.

To address these problems, a control variable called the similarity measure, $\delta_{kl}(n)$, is introduced to control the regularizer term
in the multi-task diffusion strategy. At each time instant $n$, node $k$ has access to the filter coefficient vectors of its neighborhood.
Since the node is learning from these neighboring filter coefficient vectors, it is reasonable to check the similarity among the
filter coefficient vectors. The similarity measure is calculated as follows:

$$\delta_{kl}(n) = \frac{1}{2}\Big[1 + \mathrm{sign}\big(\sigma_k^2(n) - \sigma_{kl}^2(n)\big)\Big] \tag{70}$$
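
where $\sigma_k^2(n)$ and $\sigma_{kl}^2(n)$ are estimated error variances that can be calculated as

$$\begin{aligned}
\sigma_k^2(n) &= \lambda\,\sigma_k^2(n-1) + (1-\lambda)\,\big|d_k(n) - \mathbf{u}_k^T(n)\mathbf{w}_k(n)\big|^2 \\
\sigma_{kl}^2(n) &= \lambda\,\sigma_{kl}^2(n-1) + (1-\lambda)\,\big|d_k(n) - \mathbf{u}_k^T(n)\mathbf{w}_l(n)\big|^2 \quad \text{for } l \in \mathcal{N}_k \setminus \mathcal{C}(k)
\end{aligned} \tag{71}$$

and $\lambda$ is a positive constant with $\lambda \in [0, 1]$.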

To explain, suppose that at time index $n$ node $l$ performs better than node $k$, i.e., $\sigma_{kl}^2(n) < \sigma_k^2(n)$. Then, for node $k$, the
similarity measure is $\delta_{kl}(n) = \frac{1}{2}\big[1 + \mathrm{sign}(\sigma_k^2(n) - \sigma_{kl}^2(n))\big] = 1$, which implies that node $k$ learns the weight information
from node $l$ by adding the difference of their current weight vectors, $[\mathbf{w}_l(n) - \mathbf{w}_k(n)]$, as a correction term to its weight
update. On the other hand, suppose node $l$ does not perform better than node $k$, i.e., $\sigma_{kl}^2(n) > \sigma_k^2(n)$. Then, for node $k$, the
similarity measure is $\delta_{kl}(n) = \frac{1}{2}\big[1 + \mathrm{sign}(\sigma_k^2(n) - \sigma_{kl}^2(n))\big] = 0$, which implies that node $k$ neglects the weight vector
$\mathbf{w}_l(n)$. This improves the convergence rate and steady-state performance over the multi-task diffusion strategy presented in
(14).
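The gating rule in (70) and the smoothing in (71) can be sketched in a few lines. The function names are hypothetical, the squared instantaneous error is an assumed reading of (71), and the tie case (equal variances), which the text does not discuss, simply follows the usual convention $\mathrm{sign}(0) = 0$.

```python
def sign(x: float) -> int:
    """Standard signum: -1, 0 or +1."""
    return (x > 0) - (x < 0)


def update_error_variance(prev: float, err: float, lam: float) -> float:
    """One step of the exponential smoothing in (71), using the squared error (assumed form)."""
    return lam * prev + (1.0 - lam) * err * err


def similarity(var_k: float, var_kl: float) -> float:
    """Similarity measure (70): 1 if the neighbor's vector fits node k's data better, else 0."""
    return 0.5 * (1 + sign(var_k - var_kl))


# Hypothetical variance estimates at node k for its own vector and a neighbor's vector.
print(similarity(var_k=1.0, var_kl=0.2))   # neighbor fits better, regularizer link is kept
print(similarity(var_k=0.2, var_kl=1.0))   # neighbor fits worse, regularizer link is dropped
print(update_error_variance(prev=1.0, err=2.0, lam=0.9))
```

Multiplying each $\rho_{kl}$ by this 0/1 gate is what switches the regularizer link to a neighbor on or off at each iteration.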

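Therefore, by taking the similarity measure $\delta_{kl}(n)$ into account, the modified clustered multi-task diffusion APA is given
by

$$\begin{aligned}
\boldsymbol{\psi}_k(n+1) &= \mathbf{w}_k(n) + \mu_k \mathbf{U}_k^T(n)\big(\varepsilon\mathbf{I} + \mathbf{U}_k(n)\mathbf{U}_k^T(n)\big)^{-1}\big[\mathbf{d}_k(n) - \mathbf{U}_k(n)\mathbf{w}_k(n)\big] \\
&\quad + \mu_k \eta \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl}(n)\big[\mathbf{w}_l(n) - \mathbf{w}_k(n)\big] \\
\mathbf{w}_k(n+1) &= \sum_{l \in \mathcal{N}_k \cap \mathcal{C}(k)} a_{lk}\,\boldsymbol{\psi}_l(n+1)
\end{aligned} \tag{72}$$

where $\rho_{kl}(n) = \rho_{kl}\,\delta_{kl}(n)$. Using the above expressions, the global model of the modified multi-task diffusion APA is
formulated as follows:

$$\mathbf{w}(n+1) = \mathcal{A}\Big( \mathbf{w}(n) + \mathbf{D}\,\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{e}(n) - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}_\delta(n)\,\mathbf{w}(n) \Big) \tag{73}$$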

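where

$$\mathbf{Q}_\delta(n) = \mathbf{D}_\delta(n) - \mathbf{P}_\delta(n) \otimes \mathbf{I}_L \tag{74}$$

and

$$\mathbf{D}_\delta(n) = \mathrm{diag}\{\delta_1(n)\mathbf{I}_L, \delta_2(n)\mathbf{I}_L, \ldots, \delta_N(n)\mathbf{I}_L\} \tag{75}$$

with

$$\delta_k(n) = \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl}(n) = \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl}\,\delta_{kl}(n) \tag{76}$$

The matrix $\mathbf{P}_\delta(n) = \mathbf{P} \odot \boldsymbol{\delta}(n)$ ($\odot$ indicates the Hadamard product, and $\boldsymbol{\delta}(n)$ collects the similarity measures $\delta_{kl}(n)$) is the $N \times N$ asymmetric matrix that defines the regularizer
strength among the nodes, with $\delta_k(n) = 1$ and $[\mathbf{P}_\delta(n)]_{kk} = 1$ if $\mathcal{N}_k \setminus \mathcal{C}(k)$ is empty. The matrices $\mathcal{A}$, $\mathbf{D}$ and $\mathbf{U}(n)$ are the same as
the matrices defined in the network model section. The objective now is to study the performance of the modified multi-task
diffusion APA governed by (73).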


E. Mean Error Behavior Analysis 

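By denoting $\tilde{\mathbf{w}}(n) = \mathbf{w}^\star - \mathbf{w}(n)$, the recursive update equation of the global weight error vector can be written as

$$\tilde{\mathbf{w}}(n+1) = \mathcal{A}\big[\mathbf{I}_{LN} - \mathbf{D}\mathbf{Z}(n) - \mathbf{D}\boldsymbol{\eta}\mathbf{Q}_\delta(n)\big]\tilde{\mathbf{w}}(n) - \mathcal{A}\mathbf{D}\mathbf{U}^T(n)\big[\varepsilon\mathbf{I} + \mathbf{U}(n)\mathbf{U}^T(n)\big]^{-1}\mathbf{v}(n) + \mathcal{A}\mathbf{D}\boldsymbol{\eta}\mathbf{Q}_\delta(n)\,\mathbf{w}^\star \tag{77}$$

Taking the expectation of both sides, invoking the statistical independence between $\tilde{\mathbf{w}}_k(n)$ and $\mathbf{U}_k(n)$ (i.e., the independence
assumption), additionally assuming statistical independence between $\tilde{\mathbf{w}}_k(n)$ and $\delta_{kl}(n)$, and recalling that $\mathbf{v}_k(n)$ is zero-mean
i.i.d. and independent of $\mathbf{U}_k(n)$, and thus of $\tilde{\mathbf{w}}_k(n)$, we can write

$$E[\tilde{\mathbf{w}}(n+1)] = \mathcal{A}\big[\mathbf{I}_{LN} - \mathbf{D}\bar{\mathbf{Z}} - \mathbf{D}\boldsymbol{\eta}\bar{\mathbf{Q}}_\delta\big]E[\tilde{\mathbf{w}}(n)] + \mathcal{A}\mathbf{D}\boldsymbol{\eta}\bar{\mathbf{Q}}_\delta\,\mathbf{w}^\star \tag{78}$$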

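where $\bar{\mathbf{Z}} = E[\mathbf{Z}(n)]$ and $\bar{\mathbf{Q}}_\delta = E[\mathbf{Q}_\delta(n)]$. The quantity $\bar{\mathbf{Q}}_\delta$ is given as follows:

$$\bar{\mathbf{Q}}_\delta = E\big[\mathbf{D}_\delta(n) - \mathbf{P}_\delta(n) \otimes \mathbf{I}_L\big] = \bar{\mathbf{D}}_\delta - \bar{\mathbf{P}}_\delta \otimes \mathbf{I}_L \tag{79}$$

where $\bar{\mathbf{D}}_\delta = E[\mathbf{D}_\delta(n)]$, $\bar{\mathbf{P}}_\delta = E[\mathbf{P}_\delta(n)]$, and the $(i,j)$-th block of $\bar{\mathbf{Q}}_\delta$ is $[\bar{\mathbf{Q}}_\delta]_{ij}\,\mathbf{I}_L$ with

$$[\bar{\mathbf{Q}}_\delta]_{ij} = \begin{cases} \displaystyle\sum_{l \in \mathcal{N}_i \setminus \mathcal{C}_i} \rho_{il}\,E[\delta_{il}(n)], & \text{if } i = j \\ -\rho_{ij}\,E[\delta_{ij}(n)], & \text{if } i \ne j \\ 0, & \text{otherwise} \end{cases} \tag{80}$$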

Then, for any initial condition, the modified multi-task diffusion APA strategy is stable in the mean sense if the step sizes are chosen to satisfy

$$\lambda_{\max}\left(\mathcal{A}\left[I_{LN} - \mathcal{D}Z - \mathcal{D}\eta Q_\delta\right]\right) < 1 \tag{81}$$

Using the same arguments as in II.B, we have

$$\lambda_{\max}\left(\mathcal{A}\left[I_{LN} - \mathcal{D}Z - \mathcal{D}\eta Q_\delta\right]\right) \leq \left\| I_{LN} - \mathcal{D}Z - \mathcal{D}\eta D_\delta + \mathcal{D}\eta\left(P_\delta \otimes I_L\right) \right\|_{b,\infty} \tag{82}$$

From the Gershgorin circle theorem, a sufficient condition for (81) to hold is to choose $\mu_k$ such that

$$0 < \mu_k < \frac{2}{\max_k\{\lambda_{\max}(Z_k)\} + 2\eta\,\max_k(\delta_k)} \tag{83}$$

where $\delta_k = E[\delta_k(n)] = \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}_k} \rho_{kl}\,E[\delta_{kl}(n)]$. Since $\delta_{kl}(n)$ is equal to either 0 or 1, we have $0 \leq E[\delta_{kl}(n)] \leq 1$, which, for regularization weights $\rho_{kl}$ that sum to one over $l$, implies $\delta_k \leq 1$. Therefore, the similarity measure $\delta_{kl}(n)$ gives the modified multi-task diffusion strategy a wider mean-stability range than the multi-task diffusion strategy in (14), although this range remains narrower than that of the diffusion APA because of the presence of $\eta$.
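To make the ordering of the three stability ranges concrete, the following Python sketch evaluates the upper limit in (83) for hypothetical values of $\lambda_{\max}(Z_k)$, $\eta$ and $\delta_k$. The function name and all numbers are assumptions chosen for illustration only. Setting $\delta_k = 1$ is assumed here to mimic the multi-task strategy without the similarity measure, and $\eta = 0$ to mimic the diffusion APA.

```python
def mean_stability_step_bound(lambda_max_z, eta, delta_bar):
    """Upper limit on mu_k from the sufficient condition (83).

    lambda_max_z: list of lambda_max(Z_k), one value per node (assumed known).
    eta:          regularization parameter, eta >= 0.
    delta_bar:    list of delta_k = sum_l rho_kl * E[delta_kl(n)], each in [0, 1].
    """
    if eta < 0:
        raise ValueError("eta must be non-negative")
    return 2.0 / (max(lambda_max_z) + 2.0 * eta * max(delta_bar))


# Hypothetical values for a 3-node network (not taken from the paper).
lam = [0.9, 1.0, 0.8]
bound_modified = mean_stability_step_bound(lam, eta=0.5, delta_bar=[0.4, 0.6, 0.5])
bound_multitask = mean_stability_step_bound(lam, eta=0.5, delta_bar=[1.0, 1.0, 1.0])
bound_diffusion = mean_stability_step_bound(lam, eta=0.0, delta_bar=[0.0, 0.0, 0.0])
print(bound_modified, bound_multitask, bound_diffusion)
```

With these illustrative numbers the limit for the modified strategy lies between the other two, which is the ordering described above.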

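In steady state, i.e., as $n \to \infty$, the asymptotic mean bias is given by

$$\lim_{n \to \infty} E[\tilde{w}(n)] = \left[I_{LN} - \mathcal{A}\left[I_{LN} - \mathcal{D}Z - \mathcal{D}\eta Q_\delta\right]\right]^{-1}\mathcal{A}\mathcal{D}\eta Q_\delta w^\star \tag{84}$$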
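The bias expression (84) is the fixed point of the mean recursion (78). A minimal Python sketch of this relationship is given below, using hypothetical $2 \times 2$ stand-ins: `B` plays the role of $\mathcal{A}[I_{LN} - \mathcal{D}Z - \mathcal{D}\eta Q_\delta]$ and `c` the role of $\mathcal{A}\mathcal{D}\eta Q_\delta w^\star$. The numerical values and helper names are assumptions, not quantities from the paper.

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def solve_2x2(m, b):
    """Solve m x = b for a 2x2 matrix using Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - b[0] * m[1][0]) / det]


# Hypothetical stable matrix (eigenvalues inside the unit circle) and driving term.
B = [[0.6, 0.1], [0.05, 0.7]]
c = [0.02, -0.01]

# Iterate E[w~(n+1)] = B E[w~(n)] + c from an arbitrary initial condition.
x = [1.0, -1.0]
for _ in range(500):
    Bx = mat_vec(B, x)
    x = [Bx[0] + c[0], Bx[1] + c[1]]

# Closed form corresponding to (84): (I - B)^{-1} c.
I_minus_B = [[1.0 - B[0][0], -B[0][1]], [-B[1][0], 1.0 - B[1][1]]]
bias = solve_2x2(I_minus_B, c)
print(x, bias)
```

The iterated mean and the closed-form bias agree to numerical precision whenever `B` is stable, regardless of the starting point.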


F. Mean-Square Error Behavior Analysis 

The recursive update equation of the modified multi-task diffusion APA weight error vector can also be rewritten as

$$\tilde{w}(n+1) = \mathcal{Q}_\delta(n)\tilde{w}(n) - \mathcal{A}\mathcal{D}U^T(n)\left[\epsilon I + U(n)U^T(n)\right]^{-1}v(n) + r_\delta \tag{85}$$

where

$$\mathcal{Q}_\delta(n) = \mathcal{A}\left[I_{LN} - \mathcal{D}Z(n) - \mathcal{D}\eta Q_\delta(n)\right], \qquad r_\delta = \mathcal{A}\mathcal{D}\eta Q_\delta(n)w^\star \tag{86}$$

In addition to the standard independence assumption between $U_k(n)$ and $\tilde{w}_k(n)$ and the condition $E[v(n)] = 0$ used in II.C, here we assume statistical independence between $\delta_{kl}(n)$ and $\tilde{w}_k(n)$. Then the mean square of the weight error vector $\tilde{w}(n+1)$, weighted by any positive semi-definite matrix $\Sigma$ that we are free to choose, satisfies the following relation:


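$$E\|\tilde{w}(n+1)\|^2_\Sigma = E\|\tilde{w}(n)\|^2_{\Sigma'_\delta} + E\left[v^T(n)Y^\Sigma_\delta(n)v(n)\right] + E[\tilde{w}(n)]^T E[\mathcal{Q}_\delta(n)]^T \Sigma\, r_\delta + r_\delta^T \Sigma\, E[\mathcal{Q}_\delta(n)]\,E[\tilde{w}(n)] + \|r_\delta\|^2_\Sigma \tag{87}$$

where

$$\begin{aligned} \Sigma'_\delta &= E\left[\mathcal{Q}^T_\delta(n)\,\Sigma\,\mathcal{Q}_\delta(n)\right] \\ &= \mathcal{A}^T\Sigma\mathcal{A} - Z\mathcal{D}\mathcal{A}^T\Sigma\mathcal{A} - \mathcal{A}^T\Sigma\mathcal{A}\mathcal{D}Z - \mathcal{A}^T\Sigma\,E[S_\delta] - E[S_\delta^T]\,\Sigma\mathcal{A} \\ &\quad + Z\mathcal{D}\mathcal{A}^T\Sigma\,E[S_\delta] + E[S_\delta^T]\,\Sigma\mathcal{A}\mathcal{D}Z + E\left[U^T(n)Y^\Sigma_\delta U(n)\right] + E\left[S_\delta^T\Sigma S_\delta\right] \end{aligned} \tag{88}$$

and

$$\begin{aligned} Y^\Sigma_\delta &= \left[\epsilon I + U(n)U^T(n)\right]^{-1}U(n)\,\mathcal{D}\mathcal{A}^T\Sigma\mathcal{A}\mathcal{D}\,U^T(n)\left[\epsilon I + U(n)U^T(n)\right]^{-1} \\ S_\delta &= \mathcal{A}\mathcal{D}\eta Q_\delta(n) \end{aligned} \tag{89}$$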

Following the same procedure as in II.C, to extract the matrix $\Sigma$ from the expectation terms, a weighted variance relation is introduced by using the $L^2N^2 \times 1$ column vectors

$$\sigma = \mathrm{bvec}\{\Sigma\} \quad \text{and} \quad \sigma'_\delta = \mathrm{bvec}\{\Sigma'_\delta\} \tag{90}$$

with a linear relation between the corresponding vectors $\{\sigma, \sigma'_\delta\}$:

$$\sigma'_\delta = F_\delta\,\sigma \tag{91}$$
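where $F_\delta$ is an $L^2N^2 \times L^2N^2$ matrix given by

$$\begin{aligned} F_\delta = \Big[\, & I_{(LN)^2} - (I_{LN} \otimes_b Z)(I_{LN} \otimes_b \mathcal{D}) - (Z \otimes_b I_{LN})(\mathcal{D} \otimes_b I_{LN}) \\ & - (Q_\delta \otimes_b I_{LN})(\eta \otimes_b I_{LN})(\mathcal{D} \otimes_b I_{LN}) - (I_{LN} \otimes_b Q_\delta)(I_{LN} \otimes_b \eta)(I_{LN} \otimes_b \mathcal{D}) \\ & + (I_{LN} \otimes_b Z)(Q_\delta^T \otimes_b I_{LN})(\eta \otimes_b I_{LN})(\mathcal{D} \otimes_b \mathcal{D}) \\ & + (Z \otimes_b I_{LN})(I_{LN} \otimes_b Q_\delta^T)(I_{LN} \otimes_b \eta)(\mathcal{D} \otimes_b \mathcal{D}) + \Pi\,(\mathcal{D} \otimes_b \mathcal{D}) \\ & + E\left[Q_\delta^T \otimes_b Q_\delta^T\right](\eta \otimes_b \eta)(\mathcal{D} \otimes_b \mathcal{D}) \,\Big]\,(\mathcal{A}^T \otimes_b \mathcal{A}^T) \end{aligned}$$

where $\Pi = E[Z(n) \otimes_b Z(n)]$.

The noise term is $E[v^T(n)Y^\Sigma_\delta(n)v(n)] = \gamma^T\sigma$.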

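Finally, let us define $f(r_\delta, E[\tilde{w}(n)], \sigma)$ as the last three terms on the right-hand side of (87), i.e.,

$$f(r_\delta, E[\tilde{w}(n)], \sigma) = \|r_\delta\|^2_\Sigma + E[\tilde{w}(n)]^T E[\mathcal{Q}_\delta(n)]^T \Sigma\, r_\delta + r_\delta^T \Sigma\, E[\mathcal{Q}_\delta(n)]\,E[\tilde{w}(n)] \tag{92}$$

$$\phantom{f(r_\delta, E[\tilde{w}(n)], \sigma)} = \left(r^T_{b,\delta} + \alpha^T_{1,\delta}(n) + \alpha^T_{2,\delta}(n)\right)\sigma \tag{93}$$

where

$$r_{b,\delta} = (\mathcal{A} \otimes_b \mathcal{A})(\mathcal{D} \otimes_b \mathcal{D})(\eta \otimes_b \eta)\,E\left[Q_\delta \otimes_b Q_\delta\right]\mathrm{bvec}\{w^\star (w^\star)^T\} \tag{94}$$

$$\begin{aligned} \alpha_{1,\delta}(n) = (\mathcal{A} \otimes_b \mathcal{A})\Big[\, & (I_{LN} \otimes_b \mathcal{D})(I_{LN} \otimes_b \eta)(I_{LN} \otimes_b Q_\delta) \\ & - (\mathcal{D} \otimes_b \mathcal{D})(I_{LN} \otimes_b \eta)(I_{LN} \otimes_b Q_\delta)(Z \otimes_b I_{LN}) \\ & - (\mathcal{D} \otimes_b \mathcal{D})(\eta \otimes_b \eta)\,E\left[Q_\delta \otimes_b Q_\delta\right] \Big]\,\mathrm{bvec}\{w^\star E[\tilde{w}^T(n)]\} \end{aligned} \tag{95}$$

and

$$\begin{aligned} \alpha_{2,\delta}(n) = (\mathcal{A} \otimes_b \mathcal{A})\Big[\, & (\mathcal{D} \otimes_b I_{LN})(\eta \otimes_b I_{LN})(Q_\delta \otimes_b I_{LN}) \\ & - (\mathcal{D} \otimes_b \mathcal{D})(\eta \otimes_b I_{LN})(Q_\delta \otimes_b I_{LN})(I_{LN} \otimes_b Z) \\ & - (\mathcal{D} \otimes_b \mathcal{D})(\eta \otimes_b \eta)\,E\left[Q_\delta \otimes_b Q_\delta\right] \Big]\,\mathrm{bvec}\{E[\tilde{w}(n)](w^\star)^T\} \end{aligned} \tag{96}$$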

Therefore, the mean-square behavior of the modified multi-task diffusion APA algorithm is summarized as follows:

$$E\|\tilde{w}(n+1)\|^2_\sigma = E\|\tilde{w}(n)\|^2_{F_\delta\sigma} + \gamma^T\sigma + f(r_\delta, E[\tilde{w}(n)], \sigma) \tag{97}$$

Hence the modified multi-task diffusion APA strategy presented in (72) is mean-square stable if the matrix $F_\delta$ is stable. Iterating the recursion (97) starting from $n = 0$, we get

$$E\|\tilde{w}(n+1)\|^2_\sigma = E\|\tilde{w}(0)\|^2_{F^{n+1}_\delta\sigma} + \gamma^T\sum_{i=0}^{n}F^i_\delta\,\sigma + \sum_{i=0}^{n} f(r_\delta, E[\tilde{w}(n-i)], F^i_\delta\sigma) \tag{98}$$

with initial condition $\tilde{w}(0) = w^\star - w(0)$. If the matrix $F_\delta$ is stable, then the first and second terms in the above equation converge to a finite value as $n \to \infty$. Now consider the third term on the right-hand side of (98). We know that $E[\tilde{w}(n)]$ is uniformly bounded because (78) is a BIBO stable recursion with bounded driving term $\mathcal{A}\mathcal{D}\eta E[Q_\delta]w^\star$. Therefore, from (93), $f(r_\delta, E[\tilde{w}(n-i)], F^i_\delta\sigma)$ can be written as

$$f(r_\delta, E[\tilde{w}(n-i)], F^i_\delta\sigma) = \left(r^T_{b,\delta} + \alpha^T_{1,\delta}(n-i) + \alpha^T_{2,\delta}(n-i)\right)F^i_\delta\,\sigma \tag{99}$$


Provided that $F_\delta$ is stable, there exists a matrix norm, denoted by $\|\cdot\|_\rho$, such that $\|F_\delta\|_\rho = c_{\rho,\delta} < 1$. Applying this norm to $f$ and using the properties of matrix norms and the triangle inequality, we can write $|f(r_\delta, E[\tilde{w}(n-i)], F^i_\delta\sigma)| \leq v\,c^i_{\rho,\delta}$, where $v$ is a small positive constant. Therefore $E\|\tilde{w}(n+1)\|^2_\sigma$ converges to a bounded value as $n \to \infty$, and the algorithm is said to be mean-square stable.

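By selecting $\Sigma = \frac{1}{N}I_{LN}$ we can relate $E\|\tilde{w}(n+1)\|^2_\sigma$ and $E\|\tilde{w}(n)\|^2_\sigma$ as follows:

$$\begin{aligned} E\|\tilde{w}(n+1)\|^2_\sigma = {} & E\|\tilde{w}(n)\|^2_\sigma + \gamma^T F^n_\delta\sigma - E\|\tilde{w}(0)\|^2_{(I_{(LN)^2} - F_\delta)F^n_\delta\sigma} \\ & + \sum_{i=0}^{n} f(r_\delta, E[\tilde{w}(n-i)], F^i_\delta\sigma) - \sum_{i=0}^{n-1} f(r_\delta, E[\tilde{w}(n-1-i)], F^i_\delta\sigma) \end{aligned} \tag{100}$$

We can rewrite the last two terms in the above equation as

$$\sum_{i=0}^{n} f(r_\delta, E[\tilde{w}(n-i)], F^i_\delta\sigma) - \sum_{i=0}^{n-1} f(r_\delta, E[\tilde{w}(n-1-i)], F^i_\delta\sigma) = r^T_{b,\delta}F^n_\delta\sigma + \left[\alpha^T_{1,\delta}(n) + \alpha^T_{2,\delta}(n) + \Gamma_\delta(n)\right]\sigma \tag{101}$$

where

$$\Gamma_\delta(n) = \sum_{i=1}^{n}\left(\alpha^T_{1,\delta}(n-i) + \alpha^T_{2,\delta}(n-i)\right)F^i_\delta - \sum_{i=0}^{n-1}\left(\alpha^T_{1,\delta}(n-1-i) + \alpha^T_{2,\delta}(n-1-i)\right)F^i_\delta \tag{102}$$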

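Therefore, the recursion presented in (97) can be rewritten as

$$\begin{aligned} E\|\tilde{w}(n+1)\|^2_\sigma &= E\|\tilde{w}(n)\|^2_\sigma + \gamma^T F^n_\delta\sigma - E\|\tilde{w}(0)\|^2_{(I_{(LN)^2} - F_\delta)F^n_\delta\sigma} + r^T_{b,\delta}F^n_\delta\sigma + \left[\alpha^T_{1,\delta}(n) + \alpha^T_{2,\delta}(n) + \Gamma_\delta(n)\right]\sigma \\ \Gamma_\delta(n+1) &= \Gamma_\delta(n)F_\delta + \left[\alpha^T_{1,\delta}(n) + \alpha^T_{2,\delta}(n)\right]\left[F_\delta - I_{(LN)^2}\right] \end{aligned} \tag{103}$$

with $\Gamma_\delta(0) = 0_{1 \times (LN)^2}$.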
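The recursive form of $\Gamma_\delta(n)$ in (103) can be checked against the direct double-sum definition in (102). The Python sketch below does this for a scalar stand-in, where `F` replaces $F_\delta$ and `a[m]` replaces $\alpha^T_{1,\delta}(m) + \alpha^T_{2,\delta}(m)$. The scalar simplification and the numerical values are assumptions made purely for illustration.

```python
def gamma_direct(a, F, n):
    """Gamma(n) from the two sums in (102), scalar case."""
    first = sum(a[n - i] * F ** i for i in range(1, n + 1))
    second = sum(a[n - 1 - i] * F ** i for i in range(0, n))
    return first - second


def gamma_recursive(a, F, n):
    """Gamma(n) from the recursion in (103), starting at Gamma(0) = 0."""
    g = 0.0
    for m in range(n):
        g = g * F + a[m] * (F - 1.0)
    return g


# Hypothetical scalar "F_delta" and a decaying sequence for alpha_1 + alpha_2.
F = 0.8
a = [0.5 * 0.9 ** m for m in range(20)]

for n in range(15):
    print(n, gamma_direct(a, F, n), gamma_recursive(a, F, n))
```

Both functions return the same value for every $n$, which is why the recursion can replace the growing sums when the transient curve is evaluated.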

The steady-state MSD of the modified multi-task diffusion APA strategy is given as follows:

$$\lim_{n \to \infty} E\|\tilde{w}(n)\|^2_{(I_{(LN)^2} - F_\delta)\sigma} = \gamma^T\sigma + f(r_\delta, E[\tilde{w}(\infty)], \sigma) \tag{104}$$


V. Simulation Results 

A network consisting of 9 nodes with the topology shown in Fig. 2 was considered for simulations. The nodes were divided into 3 clusters: $\mathcal{C}_1 = \{1, 2, 3\}$, $\mathcal{C}_2 = \{4, 5, 6\}$, and $\mathcal{C}_3 = \{7, 8, 9\}$. First, for theoretical performance comparison, we considered randomly generated two-dimensional vectors of the form $w^\star_{\mathcal{C}_k} = w_0 + \delta_{\mathcal{C}_k} w_{\mathcal{C}_k}$, where $\delta_{\mathcal{C}_1} = 0.025$, $\delta_{\mathcal{C}_2} = -0.025$ and $\delta_{\mathcal{C}_3} = 0.015$. The input regressors $u_k(n)$ were taken from a zero-mean Gaussian distribution with correlation matrices $R_{u,k} = I_L$, and the observation noises were i.i.d. zero-mean Gaussian random variables, independent of any other signals, with noise variance $\sigma_v^2 = 0.001$. The multi-task diffusion APA algorithm was run with different step sizes and regularization parameters. The regularization strength $\rho_{kl}$ was set to $\rho_{kl} = |\mathcal{N}_k \setminus \mathcal{C}(k)|^{-1}$ for $l \in \mathcal{N}_k \setminus \mathcal{C}(k)$, and $\rho_{kl} = 0$ for any other $l$. This setting usually leads to asymmetrical regularization weights. The coefficient matrix $C$ was taken to be the identity matrix and the combiner coefficients $a_{lk}$ were set according to the Metropolis rule.
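The weight assignments described above can be sketched in Python as follows. The topology of Fig. 2 is not reproduced in the text, so the adjacency list below is a hypothetical 9-node graph used only for illustration. Restricting the Metropolis combiners to neighbors inside the same cluster is likewise an assumption of this sketch, as are the function names.

```python
def regularization_weights(neighbors, cluster_of):
    """rho[k][l] = 1 / |N_k minus C(k)| for neighbors l outside node k's cluster."""
    rho = {}
    for k, nbrs in neighbors.items():
        outside = [l for l in nbrs if cluster_of[l] != cluster_of[k]]
        rho[k] = {l: 1.0 / len(outside) for l in outside}
    return rho


def metropolis_weights(neighbors, cluster_of):
    """comb[k][l] is the weight node k gives to in-cluster neighbor l (Metropolis rule)."""
    inside = {k: [l for l in nbrs if cluster_of[l] == cluster_of[k]]
              for k, nbrs in neighbors.items()}
    comb = {}
    for k, nbrs in inside.items():
        deg_k = len(nbrs) + 1  # degree counts the node itself
        col = {l: 1.0 / max(deg_k, len(inside[l]) + 1) for l in nbrs}
        col[k] = 1.0 - sum(col.values())  # self-weight makes the weights sum to one
        comb[k] = col
    return comb


# Clusters from the text; the undirected edges are hypothetical.
cluster_of = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3}
neighbors = {1: [2, 3, 4, 7], 2: [1, 3], 3: [1, 2, 7], 4: [1, 5, 6], 5: [4, 6],
             6: [4, 5, 8], 7: [1, 3, 8, 9], 8: [6, 7, 9], 9: [7, 8]}

rho = regularization_weights(neighbors, cluster_of)
comb = metropolis_weights(neighbors, cluster_of)
print(rho[1], rho[4])
print(comb[1])
```

In this hypothetical graph node 1 has two out-of-cluster neighbors and node 4 has one, so $\rho_{14} = 0.5$ while $\rho_{41} = 1$, which illustrates the asymmetry mentioned above.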

Simulations were carried out to illustrate the performance of several learning strategies: 1) the non-cooperative APA algorithm, 2) the multi-task algorithm (Algorithm 3), and 3) the clustered multi-task algorithm (Algorithm 1). The non-cooperative algorithm was obtained by assigning a cluster to each node and setting $\eta = 0$. The multi-task algorithm was obtained by assigning a cluster to each node and setting $\eta \neq 0$. Note that Algorithm 2 was not considered for comparison since it is a single-task estimation method. Normalized MSD (NMSD) was taken as the performance metric to compare the diffusion strategies. The projection order was taken to be 4 and the initial taps were chosen to be zero.



Figure 3: Comparison of transient NMSD for different step-sizes and regularization parameters (horizontal axis: iterations $n$).

Secondly, the modified multi-task diffusion APA was compared with the multi-task diffusion APA. For this purpose, randomly generated coefficient vectors of the form $w^\star_{\mathcal{C}_k} = w_0 + \delta_{\mathcal{C}_k} w_{\mathcal{C}_k}$ of length $L = 256$ taps were used, with $\delta_{\mathcal{C}_1} = 0.025$, $\delta_{\mathcal{C}_2} = -0.025$ and $\delta_{\mathcal{C}_3} = 0.015$. The input signal vectors were taken from a zero-mean Gaussian distribution with the correlation statistics shown in Fig. 4, and the observation noises were i.i.d. zero-mean Gaussian random variables, independent of any other signals, with the noise variances shown in Fig. 5.
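A minimal Python sketch of how such cluster-wise optimum vectors might be generated is shown below. The text does not state the distribution of $w_0$ and $w_{\mathcal{C}_k}$, so standard Gaussian entries are assumed here, and the function name and seed are hypothetical.

```python
import random


def make_cluster_tasks(length, deltas, seed=0):
    """Return w_0 and the per-cluster vectors w_0 + delta_c * w_c.

    Entries of w_0 and w_c are drawn from a standard Gaussian (an assumption).
    """
    rng = random.Random(seed)
    w0 = [rng.gauss(0.0, 1.0) for _ in range(length)]
    tasks = {}
    for cluster, delta in deltas.items():
        w_c = [rng.gauss(0.0, 1.0) for _ in range(length)]
        tasks[cluster] = [base + delta * pert for base, pert in zip(w0, w_c)]
    return w0, tasks


# Perturbation sizes from the text, with L = 256 taps.
deltas = {1: 0.025, 2: -0.025, 3: 0.015}
w0, tasks = make_cluster_tasks(256, deltas)

# The cluster optima stay close to the common vector w_0.
max_dev = max(abs(t - b) for vec in tasks.values() for t, b in zip(vec, w0))
print(len(tasks), max_dev)
```

Because the perturbation factors are small, the three optimum vectors differ only slightly, which is the setting in which regularization between clusters is expected to help.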



Figure 4: Input signal Statistics. 



Figure 5: Noise statistics. 


The projection order was taken to be 8 and the initial taps were chosen to be zero. The step-size and regularization parameters $(\mu, \eta)$ were adjusted so that the steady-state MSD and convergence rate could be compared properly. Simulation results were obtained by averaging over 50 Monte-Carlo runs. The learning curves of the diffusion strategies are presented in Fig. 6. It can be observed that the non-cooperative strategy performs poorly, as the nodes do not collaborate. The multi-task diffusion strategy improves on the non-cooperative strategy owing to the regularization between nodes. The clustered multi-task strategy uses cluster information in addition to regularization among nodes, and therefore outperforms both the non-cooperative and the multi-task diffusion strategies. The extra information in the regularization among nodes results in a further improvement for the modified clustered multi-task diffusion strategy.
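The averaging step behind such learning curves can be sketched in Python as follows. The NMSD definition used here, $\|w^\star - w\|^2 / \|w^\star\|^2$, is an assumption since it is not spelled out in this section, and the synthetic decaying curves merely stand in for the output of real algorithm runs.

```python
import math
import random


def nmsd(w_est, w_true):
    """Normalized squared deviation ||w_true - w_est||^2 / ||w_true||^2 (assumed definition)."""
    num = sum((e - t) ** 2 for e, t in zip(w_est, w_true))
    den = sum(t ** 2 for t in w_true)
    return num / den


def average_learning_curves(curves):
    """Average per-iteration values over independent Monte-Carlo runs."""
    n_runs = len(curves)
    return [sum(vals) / n_runs for vals in zip(*curves)]


def to_db(values):
    """Convert positive power-like quantities to decibels."""
    return [10.0 * math.log10(v) for v in values]


# Synthetic stand-in for 50 runs of 100 iterations each (hypothetical data).
rng = random.Random(0)
runs = [[(0.99 ** n) * (1.0 + 0.1 * rng.random()) for n in range(100)]
        for _ in range(50)]
curve_db = to_db(average_learning_curves(runs))
print(curve_db[0], curve_db[-1])
```

Averaging is done on the linear scale before converting to decibels, so that the plotted curve estimates the expected NMSD rather than the mean of its logarithm.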

VI. Conclusions 

In this paper, we presented diffusion APA strategies that are suitable for multi-task networks and robust against correlated input conditions. The performance of the proposed multi-task diffusion APA was analyzed in the mean and mean-square sense. By introducing a similarity measure, a modified multi-task diffusion APA algorithm was proposed that achieves improved performance over the multi-task diffusion strategies existing in the literature.